
Journal of Open Source Software

The Open Journal

Preprints posted in the last 30 days, ranked by how well they match Journal of Open Source Software's content profile, based on 22 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
TRaP: An Open-source, Reproducible Framework for Raman Spectral Preprocessing across Heterogeneous Systems

Zhu, Y.; Lionts, M. M.; Haugen, E.; Walter, A. B.; Voss, T. R.; Grow, G. R.; Liao, R.; McKee, M. E.; Locke, A.; Hiremath, G.; Mahadevan-Jansen, A.; Huo, Y.

2026-03-27 bioengineering 10.64898/2026.03.26.714582 medRxiv
Top 0.1%
17.4%

Raman spectroscopy offers a uniquely rich window into molecular structure and composition, making it a powerful tool across fields ranging from materials science to biology. However, the reproducibility of Raman data analysis remains a fundamental bottleneck. In practice, transforming raw spectra into meaningful results is far from standardized: workflows are often complex, fragmented, and implemented through highly customized, case-specific code. This challenge is compounded by the lack of unified open-source pipelines and the diversity of acquisition systems, each introducing its own file formats, calibration schemes, and correction requirements. Consequently, researchers must frequently rely on manual, ad hoc reconciliation of processing steps. To address this gap, we introduce TRaP (Toolbox for Reproducible Raman Processing), an open-source, GUI-based Python toolkit designed to bring reproducibility, transparency, and portability to Raman spectral analysis. TRaP unifies the entire preprocessing-to-analysis pipeline within a single, coherent framework that operates consistently across heterogeneous instrument platforms (e.g., Cart, Portable, Renishaw, and MANTIS). Central to its design is the concept of fully shareable, declarative workflows: users can encode complete processing pipelines into a single configuration file (e.g., JSON), enabling others to reproduce results instantly without reimplementing code or reverse-engineering undocumented steps. Beyond convenience, TRaP integrates configuration management, X-axis calibration, spectral response correction, interactive processing, and batch execution into a workflow-driven architecture that enforces deterministic, repeatable operations. Every transformation is explicitly recorded, making the full processing history transparent, inspectable, and reproducible. 
This eliminates ambiguity in how results are generated and ensures that identical protocols can be applied consistently across datasets and experimental contexts. Through representative use cases, we show that TRaP enables seamless, reproducible preprocessing of Raman spectra acquired from diverse platforms within a unified environment. We hope TRaP can empower Raman data processing as a reproducible, shareable, and systematized scientific practice, aligning it with modern standards for computational research. TRaP is released as open-source software at https://github.com/hrlblab/TRaP
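The declarative-workflow idea above, a single JSON file that fully determines the processing, can be sketched in a few lines of Python. The step names, arguments, and config layout below are illustrative only, not TRaP's actual schema:

```python
import json

# Hypothetical step registry -- step names and parameters are made up
# for illustration, not TRaP's actual API.
def crop(spec, lo, hi):
    """Keep only points whose wavenumber lies in [lo, hi]."""
    return [(x, y) for x, y in spec if lo <= x <= hi]

def normalize(spec):
    """Scale intensities so the maximum becomes 1.0."""
    m = max(y for _, y in spec)
    return [(x, y / m) for x, y in spec]

STEPS = {"crop": crop, "normalize": normalize}

def run_pipeline(spectrum, config):
    """Apply each declared step in order, so the JSON file alone
    is enough to reproduce the result deterministically."""
    for step in config["pipeline"]:
        spectrum = STEPS[step["name"]](spectrum, *step.get("args", []))
    return spectrum

# The shareable artifact: a plain JSON document describing the pipeline.
config = json.loads("""
{"pipeline": [
    {"name": "crop", "args": [400, 1800]},
    {"name": "normalize"}
]}
""")

raw = [(200, 5.0), (800, 10.0), (1600, 2.5), (2500, 1.0)]
print(run_pipeline(raw, config))
```

Because every transformation is named and parameterized in the config, a collaborator can re-run the identical pipeline without reverse-engineering any code, which is the reproducibility property the abstract emphasizes.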

2
Track Hub Quickload Translator: Convert Track Hub or Quickload data for viewing in the UCSC Genome Browser or the Integrated Genome Browser

Freese, N. H.; Raveendran, K.; Sirigineedi, J. S.; Chinta, U. L.; Badzuh, P.; Marne, O.; Shetty, C.; Naylor, I.; Jagarapu, S.; Loraine, A.

2026-03-30 bioinformatics 10.64898/2026.03.26.708838 medRxiv
Top 0.1%
3.7%

Summary: Track Hub Quickload Translator is a web application that interconverts University of California Santa Cruz (UCSC) Genome Browser track hub and Integrated Genome Browser (IGB) data repository formats by translating the track hub or Quickload configuration files into the other genome browser's required format. This new work enables researchers to work with tens of thousands of published genome assemblies for the first time using either browser. Availability and Implementation: Track Hub Quickload Translator is implemented in Python 3 and freely available at translate.bioviz.org. Integrated Genome Browser is available from BioViz.org. The Track Hub Quickload Translator, GenArk Genomes, and Integrated Genome Browser source code is available from github.org/lorainelab. Contact: aloraine@charlotte.edu

3
Correlate: A Web Application for Analyzing Gene Sets and Exploring Gene Dependencies Using CRISPR Screen Data

Deolankar, S.; Wermeling, F.

2026-04-04 bioinformatics 10.64898/2026.04.02.716070 medRxiv
Top 0.1%
1.8%

CRISPR screen data provides a valuable resource for understanding gene function and identifying potential drug targets. Here, we present Correlate, a freely accessible web application (https://correlate.cmm.se) that enables exploration of the Cancer Dependency Map (DepMap) CRISPR screen gene effects, hotspot mutations, and translocation/fusion data across more than 1,000 human cancer cell lines. The application supports two main use cases: (i) analysis of user-defined gene sets (e.g. CRISPR screen hits) to identify functionally linked genes based on correlations while providing an overview based on essentiality or user-provided screen statistics; and (ii) exploration of genes of interest in defined biological contexts, such as specific cancer types or mutational backgrounds, to generate hypotheses about gene function and dependencies. Additionally, Correlate supports experimental design by providing rapid overviews of gene essentiality and enabling the identification of cell lines with relevant mutational profiles. In contrast to knowledge-based approaches such as STRING and GSEA, which rely on prior biological annotations and curated interaction networks, Correlate identifies gene connections directly from functional CRISPR screen readouts, offering a complementary and data-driven perspective on gene network analysis. The application runs entirely in the browser, requires no installation or login, and integrates with the Green Listed v2.0 tool family for custom CRISPR screen design.

Highlights:
- Interactive web-based platform for bulk correlation analysis of user-defined gene sets using DepMap CRISPR screen data, requiring no installation or programming expertise.
- Identifies functional gene relationships from CRISPR screen readouts rather than curated annotations, offering a data-driven complement to tools such as GSEA and STRING.
- Enables contextual exploration of gene dependencies across cancer types and mutational backgrounds, supporting hypothesis generation about gene function and therapeutic targets.
- Supports experimental design through gene essentiality overviews, mutation and fusion analysis, and cell line identification, with optional integration of user-provided statistics from CRISPR screens, proteomics, or transcriptomics analyses.
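The correlation-based idea above can be illustrated with a small sketch: given a DepMap-style gene-by-cell-line matrix of CRISPR gene effects, functionally linked genes are those whose effect profiles co-vary. The gene names and numbers below are made up for illustration:

```python
import numpy as np

# Toy gene-effect matrix (genes x cell lines); values mimic DepMap-style
# CRISPR gene effects (more negative = more essential). All illustrative.
genes = ["GENE_A", "GENE_B", "GENE_C"]
effects = np.array([
    [-1.0, -0.2, -0.9, -0.1],   # GENE_A
    [-0.9, -0.3, -1.0, -0.2],   # GENE_B: co-varies with GENE_A
    [ 0.1, -0.8,  0.0, -0.9],   # GENE_C: anti-correlated with GENE_A
])

def top_partner(query):
    """Return the gene whose effect profile correlates best with `query`,
    i.e. the strongest candidate functional partner."""
    i = genes.index(query)
    corr = np.corrcoef(effects)[i]
    corr[i] = -np.inf                      # ignore self-correlation
    return genes[int(np.argmax(corr))]

print(top_partner("GENE_A"))
```

Correlate applies this kind of profile correlation at scale, across the full DepMap matrix, rather than relying on curated interaction annotations.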

4
StrucTTY: An Interactive, Terminal-Native Protein Structure Viewer

Jang, L. S.-e.; Cha, S.; Steinegger, M.

2026-03-19 bioinformatics 10.64898/2026.03.17.712308 medRxiv
Top 0.1%
1.5%

Terminal-based workflows are central to large-scale structural biology, particularly in high-performance computing (HPC) environments and SSH sessions. Yet no existing tool enables real-time, interactive visualization of protein backbone structures directly within a text-only terminal. To address this gap, we present StrucTTY, a fully interactive, terminal-native protein structure viewer. StrucTTY is a single self-contained executable that loads multiple PDB and mmCIF files, normalizes three-dimensional coordinates, and renders protein structures as ASCII graphics. Users can rotate, translate, and zoom in on structures, adjust visualization modes, inspect chain-level features, and view secondary structure assignments. The tool supports simultaneous visualization of up to nine protein structures and can directly display structural alignments using Foldseek's output, enabling rapid comparative analysis in headless environments. The source code is available at https://github.com/steineggerlab/StrucTTY.

Key Messages:
- Real-time, interactive protein structure visualization directly within text-only terminals
- ASCII-based, depth-aware rendering of PDB and mmCIF backbone structures
- Multi-structure comparison with direct application of Foldseek alignment transformations
- Designed for headless workflows on remote servers and HPC systems

5
REBEL, Reproducible Environment Builder for Explicit Library resolution

Martelli, E.; Ratto, M. L.; Nuvolari, B.; Arigoni, M.; Tao, J.; Micocci, F. M. A.; Alessandri, L.

2026-04-07 bioinformatics 10.64898/2026.04.04.716498 medRxiv
Top 0.2%
0.9%

Background: Achieving FAIR-compliant computational research in bioinformatics is systematically undermined by two compounding challenges that existing tools leave unresolved: long-term reproducibility and accessibility. Standard package managers re-download dependencies from live repositories at every build, making environments vulnerable to library disappearance and version drift, and pinning a package version does not pin the versions of its transitive dependencies, causing divergences between builds performed at different points in time. Compounding this, packages from repositories such as CRAN, Bioconductor, and PyPI frequently omit critical system-level dependencies from their installation metadata, leaving users to manually discover which underlying library is missing or which version is required. Beyond these technical failures, constructing a truly reproducible environment demands expertise in containerization, making reproducibility in practice a privilege rather than a standard. Findings: We present REBEL (Reproducible Environment Builder for Explicit Library Resolution), a framework that addresses both challenges through three dependency inference heuristics: (i) Deep Inspection of source code, (ii) Fuzzy Matching against a manually curated knowledge base, and (iii) Conservative Dependency Locking. The resolved dependency stack is then archived into a self-contained local store, enabling offline and deterministic rebuilds at any future time. We compared the installation of 1,000 randomly sampled CRAN packages in isolated Docker containers using the standard package manager versus REBEL: REBEL resolved 149 of the 328 standard installation failures (45.4%). Moreover, through its DockerBuilder component, REBEL generates fully reproducible Docker images from a plain-text requirements file, making deterministic environment construction accessible without expertise in containerization.
Conclusions: REBEL provides a practical foundation for FAIR-compliant, long-term reproducible bioinformatics analyses, making deterministic environment construction accessible to researchers regardless of their technical background. REBEL is freely available at https://github.com/Rebel-Project-Core

6
FuzzyClusTeR: a web server for analysis of tandem and diffuse DNA repeat clusters with application to telomeric-like repeats

Aksenova, A. Y.; Zhuk, A. S.; Lada, A. G.; Sergeev, A. V.; Volkov, K. V.; Batagov, A.

2026-03-23 bioinformatics 10.64898/2026.03.19.712643 medRxiv
Top 0.3%
0.7%

DNA repeats constitute a large fraction of eukaryotic genomes and play important roles in genome stability and evolution. While tandem repeats such as microsatellites have been extensively studied, the genomic organization and potential functions of dispersed or loosely organized repeat patterns remain poorly understood. Here we present FuzzyClusTeR, a web server for the identification, visualization and enrichment analysis of DNA repeat clusters in genomic sequences. Using parameterized metrics, FuzzyClusTeR detects both classical tandem repeats and regions where related motifs occur in proximity without forming perfect tandem arrays, which we term diffuse (or fuzzy) repeat clusters. The server supports analysis of user-defined sequences as well as genome-scale datasets, including the T2T-CHM13 and GRCh38 human genome assemblies, and provides interactive visualization and statistical tools for assessing the genomic distribution of repetitive motifs and corresponding clusters. As a demonstration, we analyzed telomeric-like repeats in the T2T-CHM13v2.0 genome and identified families of diffuse clusters enriched in these motifs. Comparison with simulated sequences suggests that these clusters represent non-random genomic patterns with potential evolutionary and functional significance. FuzzyClusTeR enables systematic exploration of repeat clustering across genomic regions or entire genomes. It is available at https://utils.researchpark.ru/bio/fuzzycluster
[Graphical abstract (Figure 1) not shown]

7
geneslator: an R package for comprehensive gene identifier conversion and annotation

Cavallaro, G.; Micale, G.; Privitera, G. F.; Pulvirenti, A.; Forte, S.; Alaimo, S.

2026-04-01 bioinformatics 10.64898/2026.03.30.714723 medRxiv
Top 0.3%
0.5%

Motivation: High-throughput sequencing generates large gene lists, making data interpretation challenging. Accurate gene annotation and reliable conversion between identifiers (e.g., gene symbols, Ensembl GeneIDs, Entrez GeneIDs) are essential for integrating datasets, conducting functional analyses, and enabling cross-species comparisons. Existing tools and databases facilitate annotation but often suffer from inconsistencies, missing mappings, and fragmented workflows, limiting reproducibility and interpretability. Results: To address these limitations, we developed geneslator, an R package that unifies gene identifier conversion, ortholog mapping, and pathway annotation across eight model organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, Arabidopsis thaliana). geneslator provides an up-to-date, precise, and coherent framework that preserves data integrity, enables cross-species analyses, and facilitates robust interpretation of gene function and regulation, outperforming state-of-the-art gene annotation tools. Availability: geneslator is available at https://github.com/knowmics-lab/geneslator. Contact: grete.privitera@unict.it

8
Variable Resolution Maps (VRM) in CCTBX and Phenix: Accounting For Local Resolution In cryoEM

Afonine, P.; Adams, P. D.; Urzhumtsev, A. G.

2026-03-28 bioinformatics 10.64898/2026.03.25.714315 medRxiv
Top 0.4%
0.5%

Calculation of density maps from atomic models is essential for structural studies using crystallography and electron cryo-microscopy (cryoEM). These maps serve various purposes, including atomic model building, refinement, visualization, and validation. However, accurately comparing model-calculated maps to experimental data poses challenges, particularly because the resolution of cryoEM experimental maps varies across the map. Traditional crystallography methods generate finite-resolution maps with uniform resolution throughout the unit cell volume, while most modern cryoEM software employs Gaussian-like functions to generate these maps, which does not adequately account for atomic model parameters and resolution. Recent work by Urzhumtsev & Lunin (2022, IUCr Journal, 9, 728-734) introduces a novel method for computing atomic model maps that incorporate local resolution and can be expressed as analytically differentiable functions of all atomic parameters. This approach enhances the accuracy of matching atomic models to experimental maps. In this paper, we detail the implementation of this method in CCTBX and Phenix. Synopsis: New tools implemented in CCTBX and Phenix allow the calculation of variable-resolution maps through a sum of atomic images expressed as analytic functions of all atomic parameters, along with their associated local resolution.

9
ATHILAfinder: a tool to detect ATHILA LTR retrotransposons in plant genomes

Bousios, A.; Primetis, E.

2026-03-22 bioinformatics 10.64898/2026.03.20.713144 medRxiv
Top 0.4%
0.4%

Motivation: The ATHILA lineage of LTR retrotransposons has colonised all branches of the plant tree of life. In Arabidopsis thaliana and A. lyrata, ATHILA elements have invaded centromeres, influencing their genetic and epigenetic organisation and driving satellite evolution. To assess the broader significance of ATHILA across plants, a computational pipeline is needed to identify ATHILA elements with high efficiency. Existing tools lack this ability because they are optimised for broad transposon classification at the expense of precise annotation of lower taxonomic levels. Results: We present ATHILAfinder, a pipeline for accurate and large-scale discovery of ATHILA elements. ATHILAfinder uses lineage-specific sequence motifs as seeds and additional filters to build de novo intact elements. Homology-based steps rescue intact ATHILA and identify soloLTRs. A detailed identity card includes coordinates, LTR identity, coding capacity, length, and other sequence features for every ATHILA. We validate ATHILAfinder in the A. thaliana Col-CEN assembly and five additional Brassicaceae species, covering four supertribes and ~30 million years of evolution. ATHILAfinder has very low false positive rates and outperforms widely used tools such as EDTA and the deep-learning-based Inpactor2 software for both recovery and precision of ATHILA. To demonstrate its usefulness, we generate insights into ATHILA dynamics across Brassicaceae. Outlook: Few computational pipelines target specific transposon lineages, yet such tools can empower their identification and downstream analyses. Our tailored approach can be adapted to other LTR retrotransposon lineages, offering new ways for high-resolution analysis of transposons.

10
TogoMCP: Natural Language Querying of Life-Science Knowledge Graphs via Schema-Guided LLMs and the Model Context Protocol

Kinjo, A. R.; Yamamoto, Y.; Bustamante-Larriet, S.; Labra-Gayo, J. E.; Fujisawa, T.

2026-03-23 bioinformatics 10.64898/2026.03.19.713030 medRxiv
Top 0.5%
0.4%

Querying the RDF Portal knowledge graph maintained by DBCLS, which aggregates more than 70 life-science databases, requires proficiency in both SPARQL and database-specific RDF schemas, placing this resource beyond the reach of most researchers. Large Language Models (LLMs) can, in principle, translate natural-language questions into executable SPARQL, but without schema-level context they frequently fabricate non-existent predicates or fail to resolve entity names to database-specific identifiers. We present TogoMCP, a system that recasts the LLM as a protocol-driven inference engine orchestrating specialized tools via the Model Context Protocol (MCP). Two mechanisms are essential to its design: (i) the MIE (Metadata-Interoperability-Exchange) file, a concise YAML document that dynamically supplies the LLM with each target database's structural and semantic context at query time; and (ii) a two-stage workflow separating entity resolution via external REST APIs from schema-guided SPARQL generation. On a benchmark of 50 biologically grounded questions spanning five types and 23 databases, TogoMCP achieved a large improvement over an unaided baseline (Cohen's d = 0.92, Wilcoxon p < 10^-6), with win rates exceeding 80% for question types with precise, verifiable answers. An ablation study identified MIE files as the single indispensable component: removing them reduced the effect to a non-significant level (d = 0.08), while a one-line instruction to load the relevant MIE file recovered the full benefit of an elaborate behavioral protocol. These results suggest a general design principle: concise, dynamically delivered schema context is more valuable than complex orchestration logic. Database URL: https://togomcp.rdfportal.org/
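The schema-context mechanism described above can be pictured as a small YAML file handed to the LLM at query time. The sketch below uses made-up field names and a made-up database entry; it illustrates the kind of structural and semantic context such a file could carry, not the actual TogoMCP MIE schema:

```yaml
# Hypothetical MIE-style schema-context file (illustrative only).
database: uniprot
endpoint: https://rdfportal.org/sparql
prefixes:
  up: http://purl.uniprot.org/core/
classes:
  - up:Protein
key_predicates:
  - up:organism          # links a protein to its taxon
  - up:recommendedName   # human-readable protein name
example_query: |
  SELECT ?protein WHERE { ?protein a up:Protein } LIMIT 10
```

Supplying predicates and an example query up front is what keeps the LLM from fabricating schema elements, the failure mode the abstract identifies.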

11
deluxpore: a Nextflow pipeline for demultiplexing Illumina dual-indexed Nanopore libraries

Arnaiz del Pozo, C.; Sanchis-Lopez, C.; Huerta-Cepas, J.

2026-03-30 bioinformatics 10.64898/2026.03.27.714410 medRxiv
Top 0.5%
0.4%

Summary: The combination of target capture metagenomics and long-read sequencing represents a powerful approach for the characterisation of rare microbial taxa and their functional genes. However, standard Nanopore library preparations are incompatible with established capture protocols. A possible workaround is the preparation of Illumina libraries prior to ONT sequencing. Currently, this hybrid approach is hindered by a lack of specialised demultiplexing software capable of handling residual adapter fragments, Nanopore's higher error rates, and positional variability. Here, we present deluxpore: a Nextflow pipeline that demultiplexes Nanopore reads from Illumina dual-indexed libraries (NEBNext and Nextera) using BLAST alignment and Levenshtein distance matching. Extensive benchmarking across 18 replicates validates the viability and precision of this hybrid indexing approach and demonstrates that accurate demultiplexing requires minimum Q20 data quality and strategic index selection. Unique index-to-sample designs achieved 91.7% sample recovery at Q20 versus 46.1% for combinatorial approaches. We also identified high-crosstalk index pairs within NEBNext Primer Set A and provide an optimized 8-sample configuration achieving ~95% accuracy at Q20. deluxpore enables reliable, automated demultiplexing for hybrid capture-long-read sequencing workflows. Availability and implementation: deluxpore is implemented in Nextflow, Python, and Bash under the GNU GPL v3.0. Source code, documentation, and benchmarking workflows are available at https://github.com/compgenomicslab/deluxpore and https://github.com/compgenomicslab/deluxpore-benchmarking.
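The Levenshtein-distance matching step mentioned above (one half of the approach; the BLAST alignment step is not shown) can be sketched as follows. The index sequences and the maximum-distance threshold are illustrative, not deluxpore's actual defaults:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,           # deletion
                           cur[-1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Hypothetical 8-bp indexes; real NEBNext/Nextera index sets differ.
INDEXES = {"sample1": "ATTACTCG", "sample2": "TCCGGAGA", "sample3": "CGCTCATT"}

def assign(read_index, max_dist=2):
    """Assign an observed (possibly error-containing) index to the closest
    known sample, rejecting distant or ambiguous (tied) matches."""
    scored = sorted((levenshtein(read_index, idx), s)
                    for s, idx in INDEXES.items())
    (d1, best), (d2, _) = scored[0], scored[1]
    if d1 > max_dist or d1 == d2:
        return None                     # leave the read unassigned
    return best

print(assign("ATTACTCA"))   # one substitution away from sample1's index
```

Tolerating a small edit distance is what accommodates Nanopore's higher error rates, while the tie rejection guards against the index crosstalk the benchmarking identified.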

12
Helicase: Vectorized parsing and bitpacking of genomic sequences

Martayan, I.; Lobet, L.; Marchet, C.; Paperman, C.

2026-03-22 bioinformatics 10.64898/2026.03.19.712912 medRxiv
Top 0.5%
0.4%

Modern sequencing pipelines routinely produce billions of reads, yet the dominant storage formats (FASTQ and FASTA) are text-based and sequential, making high-throughput parsing a persistent bottleneck in bioinformatics. Their regular, line-oriented structure makes them well suited to SIMD vectorization, but existing libraries do not fully exploit it. We present vectorized algorithms for high-throughput FASTA/Q parsing, with on-the-fly handling of non-ACTG characters and built-in bitpacking of DNA sequences into multiple compact representations. The parsing logic is expressed as a finite state machine, compiled into efficient SIMD programs targeting both x86 and ARM CPUs. These algorithms are implemented in Helicase, a Rust library exposing a tunable interface that retrieves only caller-requested fields, minimizing unnecessary work. Exhaustive benchmarks across a wide range of CPUs show that Helicase meets or exceeds the throughput of all evaluated state-of-the-art libraries, making it the fastest general-purpose FASTA/Q parser to our knowledge. Availability: https://github.com/imartayan/helicase.
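The bitpacking idea is easy to illustrate in scalar form (Helicase itself implements it with SIMD in Rust): each of A, C, G, T maps to two bits, so a k-base sequence fits in 2k bits. This sketch is a toy illustration of the representation, not Helicase's code:

```python
# 2-bit encoding of the DNA alphabet (illustrative encoding order).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def pack(seq):
    """Pack a DNA string into a single integer, two bits per base."""
    word = 0
    for b in seq:
        word = (word << 2) | CODE[b]
    return word

def unpack(word, n):
    """Recover an n-base string from its packed form."""
    return "".join(BASE[(word >> (2 * (n - 1 - i))) & 3] for i in range(n))

packed = pack("GATTACA")
print(packed, unpack(packed, 7))
```

The 4x size reduction over ASCII is the point; handling non-ACTG characters (e.g. N) requires an escape mechanism, which the library's "multiple compact representations" presumably address.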

13
EMITS: expectation-maximization abundance estimation for fungal ITS communities from long-read sequencing

O'Brien, A.; Lagos, C.; Fernandez, K.; Ojeda, B.; Parada, P.

2026-04-02 bioinformatics 10.64898/2026.03.31.715662 medRxiv
Top 0.5%
0.4%

As long-read amplicon sequencing becomes routine for fungal metabarcoding, species-level abundance estimation from ITS amplicons remains limited by naive best-hit classification, which misattributes reads among closely related species sharing similar ITS sequences and fragments abundance across redundant database entries. Here we present EMITS, a Rust-based tool that applies expectation-maximization (EM) to iteratively resolve ambiguous read-to-reference mappings from minimap2 alignments against the UNITE database, producing probabilistic species-level abundance estimates. EMITS includes platform-specific presets for Oxford Nanopore and PacBio chemistries and performs taxonomic aggregation across UNITE accessions. We validated EMITS using three complementary approaches: controlled simulations with tunable alignment noise, an Oxford Nanopore mock community of 10 fungal species with known composition, and a synthetic community of 21 species derived from UNITE reference sequences. In simulations, EM reduced L1 error by 80-92% compared to naive counting under realistic noise conditions. On the ONT mock community, EM correctly resolved within-genus species assignments where naive counting misattributed reads (e.g., Trichophyton mentagrophytes vs. T. simii; Penicillium species) and consolidated abundance across redundant database accessions. On the synthetic community, EM reduced false positive abundance by 54% and improved overall accuracy by 13.4%. Together with ITSxRust [O'Brien et al., 2026] for upstream ITS extraction, EMITS provides a complete high-performance pipeline for long-read fungal amplicon profiling.
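The expectation-maximization step at the core of this approach can be sketched in a few lines: each ambiguously mapped read is fractionally assigned to its candidate species in proportion to the current abundance estimates (E-step), and the estimates are then re-normalized from those fractional counts (M-step). The species labels and read mappings below are made up; EMITS itself works from minimap2 alignments:

```python
def em_abundance(read_hits, n_iter=100):
    """read_hits: one set of candidate species per read.
    Returns estimated relative abundances as a dict."""
    species = sorted({s for hits in read_hits for s in hits})
    theta = {s: 1.0 / len(species) for s in species}   # uniform start
    for _ in range(n_iter):
        counts = {s: 0.0 for s in species}
        for hits in read_hits:
            z = sum(theta[s] for s in hits)            # E-step: split each
            for s in hits:                             # read by abundance
                counts[s] += theta[s] / z
        total = sum(counts.values())                   # M-step: renormalize
        theta = {s: c / total for s, c in counts.items()}
    return theta

# Two reads unique to A, two ambiguous between A and B, one between B and C.
reads = [{"A"}, {"A"}, {"A", "B"}, {"A", "B"}, {"B", "C"}]
est = em_abundance(reads)
print(est)
```

Because species A has unique reads, EM pulls the ambiguous A/B reads toward A, while C, which has no unique support, is driven toward zero; this is exactly how the method avoids the misattribution that naive best-hit counting suffers from.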

14
Breaking the Extraction Bottleneck: A Single AI Agent Achieves Statistical Equivalence with Human-Extracted Meta-Analysis Data Across Five Agricultural Datasets

Halpern, M.

2026-03-23 bioinformatics 10.64898/2026.02.17.706322 medRxiv
Top 0.5%
0.4%

Background: Data extraction is the primary bottleneck in meta-analysis, consuming weeks of researcher time with single-extractor error rates of 17.7%. Existing LLM-based systems achieve only 26-36% accuracy on continuous outcomes, and no study has validated AI-extracted continuous data against multiple independent datasets using formal equivalence testing. Methods: A single AI agent (Claude Opus 4.6) extracted treatment means, control means, sample sizes, and variance measures from source PDFs across five published agricultural meta-analyses spanning zinc biofortification, biostimulant efficacy, biochar amendments, predator biocontrol, and elevated CO2 effects on plant mineral nutrition. Observations were matched to reference standards using an LLM-driven alignment method. Validation employed proportional TOST equivalence testing, ICC(3,1), Bland-Altman analysis, and source-type stratification. Results: Across five datasets, the agent produced 1,149 matched observations from 136 papers. Pearson correlations ranged from 0.984 to 0.999. Proportional TOST confirmed statistical equivalence for all five datasets (all p < 0.05). Table-sourced observations achieved 5.5x lower median error than figure-sourced observations. Aggregate effects were reproduced within 0.01-1.61 pp of published values. Independent duplicate runs confirmed extraction stability (within 0.09-0.23 pp). Conclusions: A single AI agent achieves statistical equivalence with human-extracted meta-analysis data across five independent agricultural datasets. The approach reduces extraction cost by approximately one to two orders of magnitude while maintaining accuracy sufficient for aggregate meta-analytic pooling.

Highlights
What is already known:
- Data extraction is the primary bottleneck in meta-analysis, with single-extractor error rates of 17.7%
- Existing LLM-based extraction systems achieve only 26-36% accuracy on continuous outcomes
- No study has validated AI extraction against multiple independent datasets using formal equivalence testing
What is new:
- A single AI agent achieves statistical equivalence with human-extracted data across five agricultural meta-analyses (1,149 observations, 136 papers)
- LLM-driven alignment resolves the previously underappreciated bottleneck of moderator matching, improving correlations from 0.377-0.812 to 0.984-0.997 without changing extracted values
- Table-sourced observations achieve 5.5x lower error than figure-sourced data
Potential impact for RSM readers:
- Provides a validated, reproducible workflow for AI-assisted data extraction in meta-analysis
- Demonstrates that most apparent "extraction error" in validation studies is actually alignment error
- Offers practical quality signals (source-type labeling) for downstream meta-analysts

15
Scalable computation of ultrabubbles in pangenomes by orienting bidirected graphs

Harviainen, J.; Sena, F.; Moumard, C.; Politov, A.; Schmidt, S.; Tomescu, A. I.

2026-03-31 bioinformatics 10.64898/2026.03.28.714704 medRxiv
Top 0.5%
0.3%

Motivation: Pangenome graphs are increasingly used in bioinformatics, ranging from environmental surveillance and crop improvement to the construction of population-scale human pangenomes. As these graphs grow in size, methods that scale efficiently become essential. A central task in pangenome analysis is the discovery of variation structures. In directed graphs, the most widely studied such structures, superbubbles, can be identified in linear time. Their canonical generalization to bidirected graphs, ultrabubbles, more accurately models DNA reverse complementarity. However, existing ultrabubble algorithms are quadratic in the worst case. Results: We show that all ultrabubbles in a bidirected graph containing at least one tip or one cutvertex (a common property of pangenome graphs) can be computed in linear time. Our key contribution is a new linear-time orientation algorithm that transforms such a bidirected graph into a directed graph that is, in practice, of the same size. Orientation conflicts are resolved by introducing auxiliary source or sink vertices. We prove that ultrabubbles in the original bidirected graph correspond to weak superbubbles in the resulting directed graph, enabling the use of existing linear-time algorithms. Our approach achieves speedups of up to 25x over the ultrabubble implementation in vg, and of more than 200x over BubbleGun, enabling scalable pangenome analyses. For example, on the v2.0 pangenome graph constructed by the Human Pangenome Reference Consortium from 232 individuals, after reading the input, our method completes in under 3 minutes, while vg requires more than one hour and four times more RAM. Availability: Our method is implemented in the BubbleFinder tool (github.com/algbio/BubbleFinder), via the new ultrabubbles subcommand. Contact: alexandru.tomescu@helsinki.fi

16
BCAR: A fast and general barcode-sequence mapper for correcting sequencing errors

Andrews, B.; Ranganathan, R.

2026-03-31 bioinformatics 10.64898/2026.03.27.714882 medRxiv
Top 0.5%
0.3%

Motivation: DNA barcodes are commonly used as a tool to distinguish genuine mutations from sequencing errors in sequencing-based assays. In the presence of indel errors, utilizing barcodes requires accurate alignment of the raw reads to distinguish genuine indels from indel errors. Existing strategies to do this generally rely on aligners built for homology comparison and do not fully utilize quality scores. We reasoned that developing an aligner purpose-built for error correction could yield higher-quality barcode-sequence maps. Results: Here, we present BCAR, a fast barcode-sequence mapper for correcting sequencing errors. BCAR considers all of the evidence for each base call at each position, both during alignment and during final consensus generation. BCAR creates high-accuracy barcode-sequence maps from simulated reads across a broad range of error rates and read lengths, outperforming existing methods. We apply BCAR to two experimental datasets, where it generates high-quality barcode-sequence maps. Availability and implementation: BCAR source code, documentation, and test data are available from https://github.com/dry-brews/BCAR
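The evidence-pooling idea (weighting every base call by its quality when forming a consensus) can be sketched as follows. This is a toy per-column vote over pre-aligned reads, not BCAR's actual alignment-aware algorithm:

```python
from collections import defaultdict

def consensus(reads):
    """reads: list of (sequence, per-base error probabilities), all the
    same length (i.e. already aligned). Each base call votes with weight
    1 - P(error); the heaviest base wins each column."""
    length = len(reads[0][0])
    out = []
    for i in range(length):
        weight = defaultdict(float)
        for seq, perr in reads:
            weight[seq[i]] += 1.0 - perr[i]
        out.append(max(weight, key=weight.get))
    return "".join(out)

reads = [
    ("ACGT", [0.01, 0.01, 0.01, 0.01]),
    ("ACAT", [0.01, 0.01, 0.60, 0.01]),   # low-confidence call at position 2
    ("ACGT", [0.01, 0.01, 0.01, 0.01]),
]
print(consensus(reads))
```

A quality-aware vote like this lets one confidently called base outweigh a noisy disagreement, which is why pooling all evidence per position beats simple majority counting; handling indels requires the alignment step the abstract describes.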

17
ECHO: a nanopore sequencing-based workflow for (epi)genetic profiling of the human repeatome

Poggiali, B.; Putzeys, L.; Andersen, J. D.; Vidaki, A.

2026-03-20 bioinformatics 10.64898/2026.03.18.712618 medRxiv
Top 0.6%
0.3%

Summary: The human genome is dominated by repetitive DNA, whose genetic and epigenetic variation plays a key role in gene regulation, genome stability, and disease. Recent advances in long-read sequencing now enable large-scale, haplotype-resolved, and DNA methylation-informative analysis of the human genome, including previously inaccessible complex and repetitive regions. However, comprehensive, simultaneous characterisation of the "human repeatome" remains challenging, largely due to the lack of comprehensive tools integrated in a single pipeline that can capture the full spectrum of variation across diverse types of DNA repeats. Here, we present ECHO, a user-friendly, Snakemake-based pipeline for the "(Epi)genomic Characterisation of Human Repetitive Elements using Oxford Nanopore Sequencing". ECHO provides a reproducible and scalable framework for end-to-end analysis of whole-genome nanopore sequencing data, enabling integrative as well as tailored (epi)genetic analyses of the human repeatome. Availability and implementation: ECHO is freely available on GitHub: https://github.com/leenput/ECHO-pipeline, with an archived version on Zenodo: https://zenodo.org/records/19068468. Contact: athina.vidaki@mumc.nl; athina.vidaki@maastrichtuniversity.nl

18
Visualizing and sonifying neurodata (ViSoND) for enhanced observation

Blankenship, L.; Sterrett, S. C.; Martins, D. M.; Findley, T. M.; Abe, E. T. T.; Parker, P. R. L.; Niell, C.; Smear, M. C.

2026-03-24 neuroscience 10.64898/2026.03.21.713430 medRxiv
Top 0.6%
0.3%

Neuroscience needs observation. Observation lets us evaluate data quality, judge whether models are biologically realistic, and generate new hypotheses. However, high-dimensional behavioral and neural data are too complex to be easily displayed and eye-tested. Computational methods can reduce the dimensionality of data and reveal statistically robust dynamical structure, but often yield results that are difficult to relate back to the underlying biology. In addition, the choice of which parameters to quantify may not capture unexpectedly relevant aspects of the data. To supplement quantification with enhanced qualitative observation, we developed Visualization and Sonification of NeuroData (ViSoND), an open-source approach for displaying multiple data streams using video and sonification. Sonification is nothing new to neuroscience: scientists have sonified their physiological preparations since Lord Adrian's earliest recordings. We extend this tradition by mapping multiple physiological data streams to musical notes using MIDI. Synchronizing MIDI to video makes it possible to watch an animal's movement while listening to physiological signals such as action potentials. Here we provide two demonstrations of this approach. First, we used ViSoND to interpret behavioral structure revealed by a computational model trained on the breathing rhythms of freely behaving mice. Second, ViSoND revealed patterns of neural activity in mouse visual cortex corresponding to eye blinks, events that were previously filtered out of analysis. These use cases show that ViSoND can supplement quantitative rigor with observational interpretability. Additionally, ViSoND provides an accessible way to display data that may broaden the audience for communication of neuroscientific findings.
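The mapping of a physiological signal to MIDI notes can be sketched in a few lines. The note range and min-max scaling below are assumptions for illustration, not ViSoND's actual parameters.

```python
def to_midi_notes(signal, lo=36, hi=84):
    """Map a physiological signal to MIDI note numbers by min-max
    scaling each sample into the note range [lo, hi] (C2..C6 here).
    Hypothetical sketch of a signal-to-MIDI sonification step."""
    smin, smax = min(signal), max(signal)
    span = (smax - smin) or 1.0      # avoid division by zero on flat signals
    return [round(lo + (x - smin) / span * (hi - lo)) for x in signal]
```

Each resulting note number (0..127 in MIDI) can then be emitted as a note-on event time-locked to the corresponding video frame.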

19
NeoDBS: Open-Source Platform for Visualization and Analysis of Electrophysiological Recordings from Deep Brain Stimulation Systems

Rodrigues, L.; Ferreira, A.; Pereira, I.; Moreira, R.; Jacinto, L.

2026-03-30 bioengineering 10.64898/2026.03.27.714691 medRxiv
Top 0.6%
0.3%

Optimization of deep brain stimulation (DBS) therapy for neurological and neuropsychiatric disorders depends on objective quantitative biomarkers that can guide stimulation parameter adjustments. With the recent introduction of new-generation DBS systems capable of simultaneously delivering stimulation and recording local field potentials (LFP), there is increasing demand for platforms that enable efficient visualization and analysis of these signals for electrophysiological biomarker identification. To address the limitations of currently available toolboxes, which require advanced signal processing skills and rely on proprietary software, we present NeoDBS, an open-source Python platform designed for ingestion, advanced visualization, and processing of LFP signals from DBS systems through an easy-to-use graphical interface. NeoDBS is a user-centered platform that offers predefined analysis pipelines with the aim of facilitating electrophysiological biomarker investigation for DBS across different brain disorders. Custom analysis pipelines are also available, letting users adapt the signal analysis tools to their research needs. Critical functionalities for longitudinal biomarker research are featured in NeoDBS, such as batch file processing and event-locked analysis for in-clinic and at-home recordings. This combination of accessibility, user experience, and advanced signal processing tools makes NeoDBS an environment that enables easy and fast electrophysiological biomarker research for DBS across patients, sessions, and stimulation parameters.

20
Benchmarking Agentic Bioinformatics Systems for Complex Protein-Set Retrieval: A Coccolithophore Calcification Case Study

Zhang, X.

2026-04-02 bioinformatics 10.64898/2026.03.28.715041 medRxiv
Top 0.6%
0.3%

Large language model agents are increasingly used for bioinformatics tasks that require external databases, tool use, and long multi-step retrieval workflows. However, practical evaluation of these systems remains limited, especially for prompts whose target set is both large and biologically heterogeneous. Here, I benchmarked three agent systems on the same difficult retrieval task: downloading coccolithophore calcification-related proteins from UniProt across six mechanistically distinct categories, while producing category-separated FASTA files and supporting evidence. The compared systems were Codex app agents extended with Claude Scientific Skills, Biomni Lab online, and DeerFlow 2 with default skills only. Outputs were normalized at the UniProt accession level and compared category by category using overlap analysis, Venn decomposition, and a heuristic relevance assessment of each subset relative to the benchmark prompt. Across the six shared categories, Codex retrieved 2,118 proteins, DeerFlow 6,255, and Biomni 8,752 in a single run. Codex showed the best balance between sensitivity and specificity: 92.4% of its proteins fell into subsets labeled high relevance and the remaining 7.6% into medium relevance. DeerFlow was substantially more exhaustive, but 43.8% of its proteins fell into low or low-medium relevance subsets. Biomni produced the largest sets, yet 69.5% of its proteins fell into low or low-medium relevance subsets, mainly due to broad expansion into generic calcium sensors, kinases, transcription factors, and poorly specific domain families. Category-specific analysis showed that Codex was the strongest primary source for inorganic carbon transport, calcium and pH regulation, vesicle trafficking, and signaling, whereas DeerFlow contributed valuable complementary matrix and polysaccharide candidates.
A second run for each system also separated them strongly by repeatability: Codex had the highest within-system stability (mean category Jaccard 0.982; micro-Jaccard 0.974), DeerFlow was intermediate (0.795; 0.571), and Biomni was least stable (0.412; 0.319). These results suggest that for complex protein-family retrieval tasks, agent quality depends less on raw output volume than on prompt decomposition, taxonomic scoping, exact query generation, provenance-rich export artifacts, and repeated-run stability.
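The stability metrics named above can be sketched as follows. The handling of empty categories and the exact aggregation are assumptions for illustration, not taken from the preprint.

```python
def jaccard(a, b):
    """Jaccard index of two collections (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def repeatability(run1, run2):
    """Mean per-category Jaccard and micro-Jaccard between two runs,
    where each run maps category -> set of UniProt accessions."""
    cats = set(run1) | set(run2)
    per_cat = {c: jaccard(run1.get(c, set()), run2.get(c, set())) for c in cats}
    mean_cat = sum(per_cat.values()) / len(cats)
    # Micro-Jaccard pools all accessions across categories before comparing.
    all1 = set().union(*run1.values()) if run1 else set()
    all2 = set().union(*run2.values()) if run2 else set()
    return mean_cat, jaccard(all1, all2)
```

The two views diverge when categories differ in size: the per-category mean weights every category equally, while the micro-Jaccard is dominated by the largest sets.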